ggplot2
Main packages used:
ggplot2, ggraph
Main functions covered: ggplot(), geom_*(), scale_*_*(), labs(), theme_*()
Supplementary resources:
- Data Visualization: A Practical Introduction by Kieran Healy
- Fundamentals of Data Visualization by Claus O. Wilke
- R Graphics Cookbook by Winston Chang
- ggplot2 cheat sheet
- list of ggplot2 extensions
Load our packages.
# loading and shaping data
library(readr)
library(dplyr)
library(haven)
# data sources
library(gapminder)
library(palmerpenguins)
library(eurostat)
library(maps)
# general data visualisation
library(ggplot2)
library(ggridges)
library(ggthemes)
# network related packages
library(ggraph)
library(tidygraph)
Minimize noise and maximize signal in your graphs (put another way: maximize the data-ink ratio):
source: Darkhorse Analytics
Appropriate reaction to 3D charts:
ggplot2 and its extensions
The gg in the name stands for grammar of graphics. It lets you build your plot layer by layer and control every detail of the output (if you so wish). It is used by many in academia and by writers at the Financial Times and FiveThirtyEight, among many others. During this workshop we will go through various types of data visualizations and try to apply the principles set out above to our output.
You create plots with the below syntax:
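As a rough sketch of that syntax: the capitalised names in the comment are placeholders, not real functions, and the concrete example uses the mpg dataset that ships with ggplot2 purely for illustration (it is not one of the workshop datasets).

```r
library(ggplot2)

# General shape of a ggplot2 call (capitalised names are placeholders):
# ggplot(data = DATA, mapping = aes(MAPPINGS)) +
#   GEOM_FUNCTION() +
#   SCALE_FUNCTION() +
#   labs(...) +
#   THEME_FUNCTION()

# One concrete instance of the template, using the bundled mpg data
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +                        # how the data is drawn
  scale_x_continuous() +                # how the x aesthetic is scaled
  labs(x = "Engine displacement (l)",   # text elements
       y = "Highway miles per gallon") +
  theme_minimal()                       # overall look
p
```

Each `+` adds one layer or setting, so a plot can be built up step by step.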
Let’s load our data that we’ll be using this session. (don’t worry about the message about the duplicated column names for now)
# data
penguins_df <- palmerpenguins::penguins
gapminder_df <- gapminder
ess_hun <- read_csv("data/ESS_Hun_7.csv")
oecd_sum <- read_csv("data/oecd_sum.csv")
stocks <- read_csv("data/stocks.csv")
We now have some experience with making nice figures with ggplot2. To kickstart this session, let’s review how a plot is made and extend our knowledge of how to fine-tune elements of the plot. This section was inspired by the great RLadies presentation by Eva Maerey.
We will use the Palmer Penguins dataset, which contains data for 344 penguins. There are 3 different species of penguins in this dataset, collected from 3 islands in the Palmer Archipelago, Antarctica. The data were originally published in Gorman, Williams & Fraser (2014).
Artwork by allison_horst
First, we specify the data we want to use within our ggplot() function call with the data = argument.
Second, we decide on the dimensions of our data. Let’s start by specifying what to plot on the y and x axes. This is done within the aes() argument, which stands for ‘aesthetic’.
Third, we add our wanted representation of the data, with the geom_ function family.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm)) +
geom_point()
#> Warning: Removed 2 rows containing missing values (geom_point).
Fourth, we can add a further dimension to our plot by extending the aes() arguments. Let’s add colors based on the species variable.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point()
#> Warning: Removed 2 rows containing missing values (geom_point).
Fifth, each aesthetic can be rescaled. Here we rescale our colors, using the manual color scale to specify each value. Colors can be given as HEX codes or as names.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
scale_color_manual(values = c("#7fc97f", "blue", "#fdc086"))
#> Warning: Removed 2 rows containing missing values (geom_point).
Sixth, we can modify the textual elements of our plot by assigning a string to each text element with the labs() function. As we can see, the color aesthetic automatically created a legend on the side. We can remove its title, if we want, by setting it to an empty string.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
scale_color_manual(values = c("#7fc97f", "blue", "#fdc086")) +
labs(title = "Penguins, their mass and their flippers",
subtitle = "The positive relationship between body mass and flipper size",
caption = "Data: Gorman, Williams & Fraser (2014)",
x = "Body mass (g)",
y = "Flipper length (mm)",
color = "")
#> Warning: Removed 2 rows containing missing values (geom_point).
Finally, we decide on the theme of our hearts. ggplot2 offers an ocean of customization options for our plot: there are some pre-made themes, but we can create our own as well. For now we will stick to theme_minimal().
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
scale_color_manual(values = c("#7fc97f", "blue", "#fdc086")) +
labs(title = "Penguins, their mass and their flippers",
subtitle = "The positive relationship between body mass and flipper size",
caption = "Data: Gorman, Williams & Fraser (2014)",
x = "Body mass (g)",
y = "Flipper length (mm)",
color = "") +
theme_minimal()
#> Warning: Removed 2 rows containing missing values (geom_point).
We use scatter plots to illustrate the association between two continuous variables. Usually, the y axis shows our dependent variable (the variable that is explained) and the x axis shows the independent variable, which we suspect drives the association.
Now, we want to know what the association is between GDP per capita and life expectancy.
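A minimal sketch of such a basic figure, assuming the gapminder package is installed. The object name p_basic is only illustrative, and the workshop's own first version of this plot may have differed.

```r
library(ggplot2)
library(gapminder)

# same object name as used elsewhere in this session
gapminder_df <- gapminder

# GDP per capita on the x axis, life expectancy on the y axis,
# one point per country-year and no further styling
p_basic <- ggplot(data = gapminder_df,
                  mapping = aes(x = gdpPercap,
                                y = lifeExp)) +
  geom_point()
p_basic
```

On the untransformed x axis most points are squeezed against the left edge, which is the motivation for rescaling the axis in the next step.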
Now that we have a basic figure, let’s make it better. We transform the x axis values with the scale_x_log10() and add text to our plot with the labs() function. Within geom_point() we can also specify geom specific options, such as the alpha level (transparency).
ggplot(data = gapminder_df,
mapping = aes(x = gdpPercap,
y = lifeExp)) +
geom_point(alpha = 0.25) + # inside the geom_ we can modify its attributes. Here we set the transparency levels of the points
scale_x_log10() + # rescale our x axis
labs(x = "GDP per capita",
y = "Life expectancy",
title = "Connection between GDP and Life expectancy",
subtitle = "Points are country-years",
caption = "Source: Gapminder")
To add some analytical power to our plot we can use geom_smooth() and choose a method for its smoothing function. It can be lm, glm, gam, loess, or rlm. We will use the linear model (“lm”). Note: this is purely for illustrative purposes, as our data points are country-years, so “lm” is not a proper way to fit a regression line to this data. This example also shows how to plot two geoms in one figure.
ggplot(data = gapminder_df,
mapping = aes(x = gdpPercap,
y = lifeExp)) +
geom_point(alpha = 0.25) +
geom_smooth(method = "lm", se = TRUE, color = "orange") + # adding the regression line
scale_x_log10() +
labs(x = "GDP per capita",
y = "Life expectancy",
title = "Connection between GDP and Life expectancy",
subtitle = "Points are country-years",
caption = "Source: Gapminder")
What if we want to see how each continent fares in this relationship? We need to include a new argument in the aes() mapping: color =. Now it is clear that European countries (country-years) are clustered in the upper right corner, with high GDP and high life expectancy.
ggplot(data = gapminder_df,
mapping = aes(x = gdpPercap,
y = lifeExp,
color = continent)) + # color by category
geom_point(alpha = 0.5) +
scale_x_log10() + # rescale our x axis
labs(x = "GDP per capita",
y = "Life expectancy",
title = "Connection between GDP and Life expectancy",
subtitle = "Points are country-years",
caption = "Source: Gapminder")
We can add a horizontal or vertical line to our plot if we have a particular cutoff that we want to show. We add these with the geom_hline() and geom_vline() functions.
ggplot(data = gapminder_df,
mapping = aes(x = gdpPercap,
y = lifeExp,
color = continent)) + # color by category
geom_point(alpha = 0.5) +
scale_x_log10() +
geom_vline(xintercept = 3500) + # adding vertical line
geom_hline(yintercept = 70, linetype = "dashed", color = "black", size = 1) + # adding horizontal line
labs(x = "GDP per capita",
y = "Life expectancy",
title = "Connection between GDP and Life expectancy",
subtitle = "Points are country-years",
caption = "Source: Gapminder")
We use histograms to check the distribution of the data, as we have seen in the intro sessions.
To add some flair to our figure, we use color and fill inside the geom_ call. What is the difference between the two?
ggplot(gapminder_df,
mapping = aes(x = lifeExp)) +
geom_histogram(binwidth = 1, color = "black", fill = "orange") # color sets the border of the bars, fill sets their interior; binwidth (or bins) sets the bin size
A variation on the histogram is the density plot, which uses kernel smoothing (fancy! In reality it is a smoothing function that uses weighted averages of neighboring data points).
ggplot(penguins_df,
mapping = aes(x = body_mass_g)) +
geom_density()
#> Warning: Removed 2 rows containing non-finite values (stat_density).
Add some fill.
ggplot(penguins_df,
mapping = aes(x = body_mass_g)) +
geom_density(fill = "orange", alpha = 0.3)
#> Warning: Removed 2 rows containing non-finite values (stat_density).
Your intuition is correct: we can overlay this on our histogram. To keep the y axis consistent between the histogram and the density plot, we use the ..density.. term in geom_histogram() so that the y axis shows density rather than frequency.
ggplot(penguins_df,
mapping = aes(x = body_mass_g)) +
geom_histogram(aes(y = ..density..),
binwidth = 100,
fill = "white",
color = "black") +# we add this so the y axis is density instead of count.
geom_density(alpha = 0.25, fill = "orange")
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
#> Warning: Removed 2 rows containing non-finite values (stat_density).
And, just as with histograms, we can overlay two or more density plots as well.
ggplot(penguins_df,
mapping = aes(x = body_mass_g,
fill = species)) +
geom_density(alpha = 0.5)
#> Warning: Removed 2 rows containing non-finite values (stat_density).
The ridgeline plot is quite spectacular looking and informative. It serves a similar purpose to overlaid histograms but presents the data much more clearly. For this, we need the ggridges package, which is a ggplot2 extension.
ggplot(data = penguins_df,
mapping = aes(x = body_mass_g,
y = species,
fill = species)) +
geom_density_ridges(scale = 0.8, alpha = 0.5)
#> Warning: Removed 2 rows containing non-finite values (stat_density_ridges).
We can use bar charts to visualize categorical data. Let’s prep some data. (For a refresher, check the first session on factors!) To diversify our approaches for educational purposes, this recoding is done in base R, but we could have done it in dplyr as well.
ess_hun$gndr <- factor(ess_hun$gndr, labels = c("Male", "Female"))
ess_hun$polintr <- factor(ess_hun$polintr, labels = c("Very interested", "Quite interested", "Hardly interested", "Not at all interested", "Refusal", "Don't know"), ordered = TRUE)
ess_hun$essround <- factor(ess_hun$essround, ordered = TRUE)
Let’s see the political interest of the Hungarian people.
We can use the fill option to map another variable onto our plot. Let’s see how these categories are further divided by the gender of the respondents. By default we get a stacked bar chart.
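A minimal sketch of a stacked bar chart. The toy_survey data frame is made up for illustration and only mimics the polintr and gndr columns of the ESS data, so that the example runs without the workshop files.

```r
library(ggplot2)

# hypothetical toy data standing in for the ESS survey
toy_survey <- data.frame(
  polintr = factor(c("Very", "Very", "Quite", "Quite", "Quite", "Hardly"),
                   levels = c("Very", "Quite", "Hardly")),
  gndr    = factor(c("Male", "Female", "Male", "Female", "Female", "Male"))
)

# mapping a second variable to fill splits every bar by that variable;
# geom_bar() stacks the pieces on top of each other by default
p_stacked <- ggplot(toy_survey, aes(x = polintr, fill = gndr)) +
  geom_bar()
p_stacked
```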
We can use the position argument of geom_bar() to change this. Another neat trick to make our graph more readable is coord_flip().
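A short sketch of both ideas, again on a made-up toy_survey data frame (hypothetical values, not the ESS data): position = "dodge" places the bars side by side, and coord_flip() swaps the axes so that long category labels are easier to read.

```r
library(ggplot2)

# hypothetical toy data standing in for the ESS survey
toy_survey <- data.frame(
  polintr = factor(c("Very", "Very", "Quite", "Quite", "Quite", "Hardly"),
                   levels = c("Very", "Quite", "Hardly")),
  gndr    = factor(c("Male", "Female", "Male", "Female", "Female", "Male"))
)

p_dodged <- ggplot(toy_survey, aes(x = polintr, fill = gndr)) +
  geom_bar(position = "dodge") +  # side by side instead of stacked
  coord_flip()                    # categories on the vertical axis
p_dodged
```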
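Let’s make sure that the bars are proportional. For this we can use y = ..prop.. together with a group aesthetic, so the y axis will be calculated as proportions. With group = 1 the proportions are computed over the whole sample; in the code below we use group = gndr, so they are computed within each gender. The ..prop.. is a temporary variable that has the .. surrounding it so there is no collision with a variable named prop.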
ggplot(ess_hun, aes(polintr, fill = gndr)) +
geom_bar(position = "dodge",
aes(y = ..prop.., group = gndr)) +
coord_flip()
Combining categorical and continuous data with group_by() is also doable. We just create grouped data, compute the variables we need, then plot them.
cont_sum <- gapminder %>%
group_by(continent) %>%
summarise(n = n(),
life_exp = mean(lifeExp, na.rm = TRUE))
ggplot(cont_sum, aes(continent, life_exp)) +
geom_bar(stat = "identity") +
coord_flip()
The lollipop chart is a better bar chart in the sense that it conveys the same information with a better data-ink ratio. It also looks better. (Note: some still consider it a gimmick.)
For this we will modify a chart from the Data Visualisation textbook
This chart is built in a more complex way as we have to draw the lines and the dots separately. We draw the lines with the geom_segment that requires a starting value and ending value for both the x and y axis. The dots are drawn with the geom_point and the colors are from a dummy variable in the dataset.
# for the data see the github repository of the workshop
ggplot(data = oecd_sum,
mapping = aes(x = year, y = diff, color = hi_lo)) +
geom_segment(aes(y = 0, x = year, yend = diff, xend = year)) +
geom_point() +
theme(legend.position="none") +
labs(x = NULL, y = "Difference in Years",
title = "The US Life Expectancy Gap",
subtitle = "Difference between US and OECD average life expectancies, 1960-2015",
caption = "Adapted from Kieran Healy's Data Visualisation (2019), fig.4.21 ")
ggplot(data = penguins_df,
mapping = aes(x = species,
y = body_mass_g)) +
geom_boxplot()
#> Warning: Removed 2 rows containing non-finite values (stat_boxplot).
We can add color coding to our boxplots as well.
ggplot(data = penguins_df,
mapping = aes(x = species,
y = body_mass_g,
fill = species)) +
geom_boxplot(alpha = 0.5)
#> Warning: Removed 2 rows containing non-finite values (stat_boxplot).
ggplot(data = penguins_df,
mapping = aes(x = species,
y = body_mass_g)) +
geom_violin()
#> Warning: Removed 2 rows containing non-finite values (stat_ydensity).
For line charts, we use data on stock closing prices. As we are now familiar with the ggplot2 syntax, I do not write out data = and mapping = every time.
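As a stand-in for the stocks data (which lives in the workshop repository), the basic pattern can be sketched with the economics_long dataset bundled with ggplot2. It has the same shape: a date column, a value column, and a grouping column that plays the role of company.

```r
library(ggplot2)

# economics_long: one row per date and series, with the series name in
# `variable` and a rescaled value in `value01`
p_line <- ggplot(economics_long, aes(date, value01, color = variable)) +
  geom_line()  # one line per level of the color variable
p_line
```

Mapping the grouping column to color is what makes ggplot2 draw a separate line for each series.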
Add some refinements.
ggplot(stocks, aes(date, stock_closing, color = company)) +
geom_line(size = 0.7) +
labs(x = "", y = "Prices (USD)",
title = "Closing daily prices for selected tech stocks",
subtitle = "Data from 2016-01-10 to 2018-01-10",
caption = "source: Yahoo Finance")
Faceting (or creating small multiples) is an excellent way to declutter our plot.
ggplot(stocks, aes(date, stock_closing, color = company)) +
geom_line(size = 1) +
labs(x = "", y = "Prices (USD)",
title = "Closing daily prices for selected tech stocks",
subtitle = "Data from 2016-01-10 to 2018-01-10",
caption = "source: Yahoo Finance") +
facet_wrap(~company, nrow = 4)
QUICK EXERCISE: Create a plot from the gapminder dataset that shows the distribution of life expectancy for each continent. Use facet_wrap or facet_grid.
There comes a moment when you need to create a pie chart. Here is an example of one acceptable use case of this technique.
Our data:
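The data frame itself is not shown in this section. A hypothetical stand-in with the two columns that the plotting code expects (labels and picture_objects) could look like the sketch below. The numbers are placeholders chosen only so that the example runs; they are not the workshop's actual values.

```r
# hypothetical stand-in for the pyramid data; the values are placeholders
pyramid <- data.frame(
  labels          = c("Sky", "Sunny side", "Shady side"),
  picture_objects = c(75, 18, 7)  # assumed shares, for illustration only
)
pyramid
```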
We create a custom theme as a pie chart deserves all the attention we can give to it:
theme_pyramid <- function() {
theme_minimal() %+replace%
theme(
axis.line=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.grid=element_blank(),
legend.title=element_blank()
)
}
And we plot:
ggplot(pyramid, aes(x = "", y = picture_objects, fill = labels)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("#0095D9","#F5E837", "#C4B730"), breaks = c("Sky", "Sunny side", "Shady side")) +
coord_polar("y", start = 8.9) +
theme_pyramid()
In this section we will go over some of the elements that you can modify in order to get an informative and nice looking figure. ggplot2 comes with a number of themes. You can play around with the themes that come with ggplot2, and you can also take a look at the ggthemes package, from which I included the economist theme. Another notable theme collection is the hrbrthemes package. The BBC has also published the R package they use to create their graphics. You can find it on GitHub here: https://github.com/bbc/bbplot
Try out a couple to see how they differ! The ggthemes package has a nice collection of themes to use. The theme presets can be used with the theme_*() functions.
One of my personal favorites is theme_minimal().
ggplot(data = gapminder_df,
mapping = aes(x = gdpPercap,
y = lifeExp)) +
geom_point(alpha = 0.25) +
scale_x_log10() +
theme_minimal() # adding our chosen theme
Of course, we can set all elements to suit our needs without using someone else’s theme.
The key plot elements that we will look at are:
Adding labels and a title, as we did before. For this example we will use the same plot that we created at the beginning of the session.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
labs(title = "Penguins, their mass and their flippers",
subtitle = "The positive relationship between body mass and flipper size",
caption = "Data: Gorman, Williams & Fraser (2014)",
x = "Body mass (g)",
y = "Flipper length (mm)",
color = "")
#> Warning: Removed 2 rows containing missing values (geom_point).
Let’s use a different color scale! We can use a ColorBrewer scale (widely used for data visualization). To check the various palettes, see http://colorbrewer2.org
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
scale_color_brewer(name = "Species", palette = "Set2") # adding the color brewer color scale
#> Warning: Removed 2 rows containing missing values (geom_point).
Or we can define our own colors:
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species)) +
geom_point() +
scale_color_manual(values=c("coral3", "deepskyblue3", "orange")) # adding our manual color scale
#> Warning: Removed 2 rows containing missing values (geom_point).
To make each species more distinct we can use different shapes as well.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species,
shape = species)) +
geom_point()
#> Warning: Removed 2 rows containing missing values (geom_point).
To give a brief glimpse of how customizable a ggplot2 figure is, we are going to modify some theme elements and drastically change the look of our plot for the better.
To clean up clutter, we will remove the background and leave only some of the grid behind:
- We can hide the tick marks by setting axis.ticks to element_blank() inside theme().
- Hiding gridlines also requires some digging in theme(), using the panel.grid.minor or panel.grid.major arguments.
- If you want to change or remove a gridline on a certain axis, you can specify it, for example panel.grid.major.x. We can also set the background to nothing.
- We can modify the gridlines with the element_line() function.
- As a final touch, the axis line is also recolored.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species,
shape = species)) +
geom_point() +
theme(axis.ticks = element_blank(), # removing axis ticks
panel.grid.major.y = element_line(color = "grey", size = 0.1, linetype = "dotted"), # recoloring the panel grids
panel.background = element_blank(), # removing the background
axis.line = element_line(colour = "grey")) # recoloring the axis line
#> Warning: Removed 2 rows containing missing values (geom_point).
Finally, let’s move the legend around (or just remove it with theme(legend.position="none")). We also do not need the background of the legend keys, so we remove it with legend.key, and we play around with the text elements of the plot with text.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species,
shape = species)) +
geom_point() +
theme(axis.ticks = element_blank(), # removing axis ticks
panel.grid.major.y = element_line(color = "grey", size = 0.1, linetype = "dotted"), # recoloring the panel grids
panel.background = element_blank(),
axis.line = element_line(colour = "grey"),
legend.title = element_text(size = 12), # setting the legends text size
text = element_text(face = "plain", family = "sans"), # setting global text options for our plot
legend.key=element_blank(), # removing the background of the legend keys
legend.position = "bottom") # moving the legend to the bottom
#> Warning: Removed 2 rows containing missing values (geom_point).
In the final version, we add a custom color palette, the labels, and all the modifications we have made so far.
ggplot(data = penguins_df,
aes(x = body_mass_g,
y = flipper_length_mm,
color = species,
shape = species)) +
geom_point() +
labs(title = "Penguins, their mass and their flippers",
subtitle = "The positive relationship between body mass and flipper size",
caption = "Data: Gorman, Williams & Fraser (2014)",
x = "Body mass (g)",
y = "Flipper length (mm)",
color = "") +
scale_color_brewer(name = "Species", palette = "Set2") +
scale_shape(guide = "none") +
theme(axis.ticks = element_blank(), # removing axis ticks
panel.grid.major.y = element_line(color = "grey", size = 0.1, linetype = "dotted"), # recoloring the panel grids
panel.background = element_blank(),
axis.line = element_line(colour = "grey"),
legend.title = element_text(size = 12), # setting the legends text size
text = element_text(face = "plain", family = "sans"), # setting global text options for our plot
legend.key=element_blank(), # removing the background of the legend keys
legend.position = "bottom") # moving the legend to the bottom
#> Warning: Removed 2 rows containing missing values (geom_point).
While we are at it, we want to have labels for our data. For this, we’ll create a plot that can make use of them. Let’s return to the gapminder data for this exercise. To keep the number of observations manageable, we are going to filter the data somewhat.
We use geom_text() to put our labels in the chart.
gapminder_small <- gapminder_df %>%
filter(lifeExp >= 72.5, gdpPercap >= 10000, continent == "Europe", year == 2002)
ggplot(gapminder_small, aes(lifeExp, gdpPercap, label = country)) + # we add the labels!
geom_point() +
geom_text()
To avoid overlapping text, use the ggrepel package, which provides this functionality via the ggrepel::geom_text_repel() and ggrepel::geom_label_repel() functions.
ggplot(gapminder_small, aes(lifeExp, gdpPercap, label = country)) +
geom_point() +
ggrepel::geom_text_repel()
Without ggrepel, notice the different outcome of using geom_label() instead of geom_text().
ggplot(gapminder_small, aes(lifeExp, gdpPercap, label = country)) + # we add the labels!
geom_point() +
geom_label() # and use the geom label
If we want to label a specific set of countries we can do it from inside ggplot, without needing to touch our data.
ggplot(gapminder_df, aes(gdpPercap, lifeExp, label = country)) + # we add the labels!
geom_point(alpha = 0.25) +
geom_text(aes(label = if_else(gdpPercap > 60000, as.character(country), NA_character_))) # we add a conditional within the geom: only country-years with a very high GDP per capita get a label
#> Warning: Removed 1699 rows containing missing values (geom_text).
QUICK EXERCISE: Choose a dataset that we’ve used in this session so far, and create a plot where: 1) You modify the color and size of the geom 2) You add your own labels to the plot 3) You modify at least one theme element
Let’s load our data from an edgelist. We are using the tidygraph and ggraph packages, both of which depend heavily on the igraph package, one of the most powerful packages for network analysis in R.
# data
edges_got <- read_csv("https://raw.githubusercontent.com/melaniewalsh/sample-social-network-datasets/master/sample-datasets/game-of-thrones/got-edges.csv")
Let’s create the network object and add some network statistics to our small social network.
soc_nw <- as_tbl_graph(edges_got, directed = FALSE) %>%
activate(nodes) %>%
mutate(centrality = centrality_eigen(), community = as.factor(group_infomap()))
We plot the network with the ggraph() function, which is a network-oriented extension of ggplot2. The nodes and links are plotted separately with the geom_edge_*() and geom_node_*() functions, in this case geom_edge_link() and geom_node_point().
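A minimal sketch of that basic plot, using a small made-up edge list (toy_edges is hypothetical) instead of the Game of Thrones data so that it runs without a download. It assumes the tidygraph and ggraph packages are installed.

```r
library(tidygraph)
library(ggraph)

# hypothetical toy edge list: who is connected to whom, and how strongly
toy_edges <- data.frame(
  from   = c("A", "A", "B", "C"),
  to     = c("B", "C", "C", "D"),
  Weight = c(3, 1, 2, 1)
)

# build an undirected network object from the edge list
toy_nw <- as_tbl_graph(toy_edges, directed = FALSE)

# edges first, nodes on top; "fr" is the Fruchterman-Reingold layout
p_network <- ggraph(toy_nw, layout = "fr") +
  geom_edge_link() +
  geom_node_point()
p_network
```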
An alternative with modifications to the link and node attributes. Note the theme_graph() at the end.
ggraph(soc_nw, layout = "fr") +
geom_edge_link(aes(width = Weight), alpha = 0.2) +
scale_edge_width(range = c(0.5,2)) +
geom_node_point(aes(size = centrality), alpha = 0.8) +
theme_graph()
As a final touch, let’s add the communities in the network and labels for our nodes.
ggraph(soc_nw, layout = "fr") +
geom_edge_link(aes(width = Weight), alpha = 0.2, show.legend = FALSE) +
scale_edge_width(range = c(0.5,1.5)) +
geom_node_point(aes(size = centrality, color = community), alpha = 0.8, show.legend = FALSE) +
geom_node_text(aes(label = if_else(centrality >= 0.35, name, NA_character_)), size = 3, repel = TRUE, show.legend = FALSE) +
scale_color_brewer(palette = "Set2") +
labs(title = "Social network of the Song of Ice and Fire books",
caption = "Data: <github.com/melaniewalsh/sample-social-network-datasets>") +
theme_graph()
There are two essential parts of creating a map with ggplot2:
- a shapefile, which draws the map
- some data that we want to plot over the map
Getting the map data from the maps package
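A sketch of this step, assuming the maps package is installed: ggplot2's map_data() converts a map from the maps package into a regular data frame of polygon coordinates. The object is called world here because that is the name the following code uses.

```r
library(ggplot2)  # map_data() is part of ggplot2 and relies on the maps package

# one row per polygon vertex: longitude, latitude, polygon group and region name
world <- map_data("world")
head(world)
```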
We can plot the empty map
ggplot(data = world,
mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", color = "black") +
coord_cartesian()
We can also subset the map data, just as we can with any other R object.
# subsetting the world data
world_subset <- world[world$region == "France",]
ggplot(data = world_subset,
mapping = aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", color = "black") +
coord_cartesian()
We can add data to our map. We subset our gapminder data for the year 1977, then add a new column that matches the region variable in the map data so we can merge the two datasets. (We also get rid of Antarctica, for aesthetic reasons.)
gapminder_1977 <- gapminder_df %>%
filter(year == 1977) %>%
mutate(country = as.character(country))
# adding key variable for merge
gapminder_1977$region <- gapminder_1977$country
# merging the data and the map and getting rid of antarctica
world_data <- left_join(world, gapminder_1977, by = "region") %>%
filter(region != "Antarctica")
And now we can plot the map and data with geom_polygon() and coord_quickmap(). I also made some modifications to the theme so that it looks better.
ggplot(world_data, aes(long, lat, group = group, fill = lifeExp)) +
geom_polygon(color = "gray90", size = 0.05 ) +
coord_quickmap() +
labs(fill = "Life expectancy",
title = "Life expectancy around the world",
subtitle = "1977",
caption = "Data: Gapminder") +
scale_fill_viridis_c(na.value = "white", direction = -1) +
theme_bw() +
theme(axis.line=element_blank(),
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid=element_blank(),
panel.spacing=unit(0, "lines"),
plot.background=element_blank(),
legend.justification = c(0,0),
legend.position = c(0,0)
)
The vignette contains the full tutorial on how to use the eurostat package to get data through the Eurostat API. If you are interested, check it out later.
For this example we are going to use the Eurobarometer data to plot trust in the EU. We select the country ISO codes and the relevant item (qa8a_10). Then we create a proper factor with the mutate function and recode some obscure country codes that would cause problems with merging further down the line, using an ugly-looking nested dplyr::if_else chain. Finally, we drop NAs.
eurobarometer_raw <- read_stata("data/ZA6963_v1-0-0.dta")
q8_eurobar <- eurobarometer_raw %>%
select(isocntry, qa8a_10) %>%
mutate(trust_in_eu = factor(qa8a_10, levels = c(1,2,3), labels = c("Tend to trust", "Tend not to trust", "Don't know")),
geo = if_else(isocntry == "DE-W", "DE",
if_else(isocntry == "DE-E","DE",
if_else(isocntry == "GB-GBN", "UK",
if_else(isocntry == "GB-NIR", "UK", isocntry))))) %>%
tidyr::drop_na()
Then we look at the proportion of people who trust the EU in each member state.
# share of respondent who trust the EU
q8_country_trust <- q8_eurobar %>%
group_by(geo, qa8a_10) %>%
summarise(n = sum(n())) %>%
ungroup() %>%
filter(qa8a_10 == 1)
# total number of respondents
q8_total <- q8_eurobar %>%
group_by(geo) %>%
summarise(sum = n())
# merging the two dataset to calculate the share of people trusting the EU
q8_map_merge <- left_join(q8_country_trust, q8_total, by = "geo") %>%
mutate(trust_pct = round((n / sum) * 100),
trust_cat = factor(case_when(trust_pct >= 50 ~ 1,
trust_pct < 50 ~ 0)))
Finally, we download the EU shapefile via the eurostat package.
To plot the data on the map we have to combine the shapefile and the survey data with a left_join.
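The join logic can be sketched on two small made-up tables. toy_shapes is a hypothetical stand-in for the shapefile (in the workshop this would be the spatial object downloaded from eurostat), and toy_trust stands in for the survey summary. Both share the geo key, and the numbers are invented.

```r
library(dplyr)

# hypothetical stand-in for the shapefile: one row per country code
toy_shapes <- data.frame(geo = c("AT", "HU", "DE"))

# hypothetical stand-in for the survey summary, with made-up percentages
toy_trust <- data.frame(geo = c("HU", "AT"), trust_pct = c(50, 40))

# left_join keeps every country of the map; countries without survey data get NA
toy_map_plot <- left_join(toy_shapes, toy_trust, by = "geo")
toy_map_plot
```

Putting the map on the left side of the join is what keeps countries without survey data on the map; they end up with NA, which the plot below fills in white through na.value.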
Now we are ready to plot! We use geom_sf() to fill the countries with our data and coord_sf() to limit the map to the given coordinates. For the color scale, let’s use the viridis scale. Although we use theme_minimal(), we can make additional changes to the theme by adding the theme() call last.
ggplot(eu_map_plot) +
geom_sf(aes(fill = trust_pct), color= "grey50", size = 0.1) +
coord_sf(xlim=c(-12,44), ylim=c(35,70)) +
scale_fill_viridis_c(na.value = "white", direction = -1, name = NULL) +
labs(title = "Trust in the European Union",
subtitle = "% of Tend to trust") +
theme_minimal() +
theme(axis.text = element_blank())
If we want to add arbitrary points to the map, we can do that by specifying their longitude and latitude coordinates.
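The point data is not shown in this section. A hypothetical my_city data frame with the long and lat columns that the plotting code expects could be created as below. The coordinates are roughly those of Budapest and are only an assumption about which city was meant.

```r
# hypothetical point to plot: approximate coordinates of Budapest
my_city <- data.frame(
  long = 19.04,  # longitude, degrees east
  lat  = 47.50   # latitude, degrees north
)
my_city
```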
Then just plot over our map with the geom_point()
ggplot(data=eu_map_plot) +
geom_sf(fill= "white", color="dim grey", size=.1) +
geom_point(data = my_city, aes(x = long, y = lat), color = "orange", size = 5, alpha = 0.7) +
theme_light() +
coord_sf(xlim=c(-12,44), ylim=c(35,70))